Check the DBM comment region in place, dropping the duplicate-comment substring#11737
Check the DBM comment region in place, dropping the duplicate-comment substring#11737dougqh wants to merge 29 commits into
Conversation
- fast replaceAll for a fixed string & replacement, 3x throughput compared to regex based solutions, 1/2x allocation compared to regex solutions - added SubSequence which provides a view into a subsequence of a String without incurring extra allocation - Strings.spliit returns an Iterable<SubSequence> can be used to do light weight processing of a String
These benchmarks exist to show why the APIs are forbidden
…race-java into dougqh/strings-improvements
…tion) Per bric3 review: the doc said what SubSequence avoids allocating but not why it matters. Explain the use case — allocation-free lightweight parsing: substring/ subSequence copy per call, so splitting a string into many pieces on a hot path allocates O(pieces) Strings; SubSequence is a zero-copy view. Also fixes a typo. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
SharedDBCommenter.containsTraceComment built nine "KEY + =" strings on every call: the keys are encode()-derived (not compile-time constants), so each "KEY + =" is a runtime concat. A non-matching comment ran all nine checks and allocated nine throwaway Strings per call. Precompute the needles once at class init as *_EQ constants; the contains-chain is unchanged. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Throughput benchmark at @threads(8) over a realistic comment-content mix (mostly non-DD, the all-nine-checks worst case) to surface the per-call allocation removed by hoisting the "<key>=" needles to constants. Run the parent commit (baseline) vs this branch and read the ops/s delta; -prof gc (gc.alloc.rate.norm) corroborates the mechanism. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…tent substring SQLCommenter.hasDDComment materialized sql.substring(commentStart, commentEnd) on every duplicate-comment check just to feed containsTraceComment. Add a SharedDBCommenter.containsTraceComment(sql, from, to) range overload that scans the comment body in place, and a Strings.regionContains primitive it (and the String delegate) build on -- no per-call substring. The String overload now delegates to the range form, so Mongo behavior is unchanged. regionContains is the allocation-free, copy-free primitive; the natural-reading SubSequence.contains layer can delegate to it later. Boundary semantics unit- tested on Strings.regionContains; DB needle behavior on the overloads; existing SQLCommenter/SharedDBCommenter suites unchanged and green. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…o dougqh/dbcommenter-scan-overload # Conflicts: # internal-api/src/main/java/datadog/trace/util/Strings.java
…diom) Add SubSequence.contains (delegating to the Strings.regionContains primitive) and rewrite containsTraceComment(sql, from, to) as SubSequence.of(sql, from, to).contains(...) -- a 1-to-1 substitution for what you'd idiomatically write on a substring, with no copy. EA elides the view in the transient consumption; even if it doesn't, the string copy is still avoided. Couples this branch to the SubSequence base (#10640); regionContains stays as the allocation-free primitive the view delegates to. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Isolates the duplicate-comment guard (dbType=null skips the first-word scan; already-DD-commented SQL makes inject return early). The extractCommentContent substring allocated 140 B/op; the in-place range/view scan is EA-elided (~0). @threads(8), @fork(2), -prof gc; numbers in the class Javadoc. Throughput is flat-to-slightly-up but within @fork(2) noise -- this path is scan-CPU-dominated, so the win is the allocation, not throughput. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
🟢 Java Benchmark SLOs — All performance SLOs passed
PR vs. master results
Commit: Load and DaCapo benchmarks can be triggered manually in the GitLab pipeline. Results will appear in the Benchmarking Platform UI after completion. |
Benchmark results (zulu-17, JDK 17.0.7,
|
| throughput | gc.alloc.rate.norm | |
|---|---|---|
| before (substring) | 23.5M ± 1.1M ops/s | 140 B/op |
| after (range/view) | 26.2M ± 1.5M ops/s | ~0 B/op (10⁻⁵) |
The extractCommentContent substring (140 B/op) is gone — the in-place range scan and the SubSequence view it flows through are both EA-elided. The headline win is the allocation: this path is dominated by the nine indexOf scans (CPU the alloc-removal doesn't touch), so throughput is a modest ~1.1× uplift, now resolved at @Fork(5). Benchmark Javadoc refreshed in bd0983f.
⛔ DO NOT REVIEW YET — blocked on #11734 and #10640
Draft placeholder. The diff is inflated: this stacks on two not-yet-merged PRs whose contents show up here —
containsTraceComment_EQhoist) — modifies the same method; this is its natural successor.SubSequence) — the view this consumes.Once both land on
masterand this is rebased, the diff reduces to this PR's real change. Please hold review until then.What this actually does (post-rebase)
Drops the per-call
sql.substring(...)thatSQLCommenter.hasDDCommentdid just to scan a comment for trace-comment needles. Adds:Strings.regionContains(s, from, to, needle)— an allocation-free, copy-free range check (the primitive).SubSequence.contains(needle)— the natural view method, delegating to the primitive.SharedDBCommenter.containsTraceComment(sql, from, to)— checks the comment body in place asSubSequence.of(sql, from, to).contains(...); theStringoverload (used by Mongo) delegates to it.SQLCommenterthen checks the comment region in place — no substring. Reads as a 1-to-1 substitution for the idiomaticStringform.Measured —
SQLCommenterDuplicateCommentBenchmark(JDK 25,@Threads(8),@Fork(2),-prof gc)gc.alloc.rate.normThe allocation delta (140 → 0 B/op) is the win and is exact/fork-stable. Throughput is indicative only —
@Fork(2)and the run wasn't on a quiet machine; this path is scan-CPU-dominated, so the win is the allocation, a small cut that compounds across comment-bearing injects. (Clean@Fork(5)re-run pending before ready.)🤖 Generated with Claude Code